- 
                Notifications
    You must be signed in to change notification settings 
- Fork 49
NETOBSERV-2075: Simplifying observed interfaces - moving it to main map #509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
        
          
                bpf/flows.c
              
                Outdated
          
        
      | bpf_printk("error creating new observed_intf: %d\n", ret); | ||
| } | ||
| } | ||
| add_observed_intf(aggregate_flow, skb->ifindex, direction); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to acquire lock here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make interface update part of update_existing_flow()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, but I still take the lock in add_observed_intf for 2 reasons:
- we can't call a function while having a lock (so for instance the call to increase_counterisn't allowed)
- we can't read map value while having the lock, so all the for-loop to check for existing intf+direction isn't allowed during locking
 In other words we need to keep lock as short time as possible
93b677d    to
    0921905      
    Compare
  
    | New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=6970084 make set-agent-image | 
| New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=b0d6deb make set-agent-image | 
| u8 dscp; | ||
| u8 nb_observed_intf; | ||
| u8 observed_direction[MAX_OBSERVED_INTERFACES]; | ||
| u32 observed_intf[MAX_OBSERVED_INTERFACES]; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why u changed this instead of having array of struct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's padding optimization. With a struct there was a 3 bytes padding per struct instance, so an extra 18 bytes in total. Separated arrays reduce it to a minimum.
        
          
                bpf/flows.c
              
                Outdated
          
        
      | if (nb_observed_intf < MAX_OBSERVED_INTERFACES) { | ||
| for (u8 i = 0; i < nb_observed_intf; i++) { | ||
| if (value->observed_intf[i] == if_index) { | ||
| // Interface already seen -> skip (we don't check for direction to avoid reaching the max too frequently, as a tradeoff) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the impact of ignoring direction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed that in slack. The result is not marking twice the same interface when it has a different direction. E.g. if a flow shows eth0/egress then eth0/ingress, only eth0/egress is going to be stored.
We may document that somewhere as a limitation ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as said in slack an alternative could be to introduce a new direction enum "BOTH", if we think it's important. It would take best of both worlds, ie. avoid reaching the max interface and still telling everything about what was observed. But it needs changes in the console plugin for display. I'll do that as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> https://issues.redhat.com/browse/NETOBSERV-2067
(I added it as a 1.8 tech-debt, although we'll see if it can make into 1.8 or be postponed to 1.9)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this introduce different user experience compared to what we shipped prev. i.e we need doc and qe awareness for automation updates etc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly theoretical, I don't think it involve any real loss in user experience - in fact, when I was seeing this the other day, I'm now pretty sure it was due to the non-IP flows that we're not capturing anymore.
Anyway, it's not a big deal to manage "both" directions so I've added that in the last commit. And on console side: netobserv/network-observability-console-plugin#695 (I generated fake data to be able to see it in console)
| static __always_inline void update_existing_flow(flow_metrics *aggregate_flow, pkt_info *pkt, | ||
| u64 len, u32 sampling, u32 if_index, | ||
| u8 direction) { | ||
| // Count only packets seen from the same interface as previously to avoid duplicate counts | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think u need to grap the lock here and free after the else and remove it from add_observred_intf to make sure not reading any value while it can be updated from another thead on different core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not possible, that's what I said here #509 (comment)
- when lock is acquired, no call function is allowed. Currently there's two call functions: add_observed_intfandincrease_counter
- reading map value isn't allowed while lock is acquired. For instance, the line if (value->observed_intf[i] == if_index)inadd_observed_intfgenerates an error
If we really wanted to, we could work that around, inlining everything and creating temporary copies of values, but I think it would just make it worse.
The rule of thumb is to keep the lock as little time as possible, which is what I think I'm doing here. If the value is read from another thread before the lock is acquired, I don't think it's a problem, it's just going to get the value before it's updated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if u use always_inline, the risk I see assume one thread added one interface and inc the count while another thread already looped over the prev counter and we end up overwritting it, is the error compiler err or verification err can u share ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the verifier, on startup:
Without inlining:
...ES) {\n\t1528: (25) if r1 > 0x5 goto pc+100 1629: frame1: R1=scalar(umin=6,umax=255,var_off=(0x0; 0xff)) R3=0 R5=1 R6=map_value(id=5,off=48,ks=40,vs=96,imm=0) R7=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R8=0 R9=map_value(id=5,off=0,ks=40,vs=96,imm=0) R10=fp0
 fp-32=????mm?? fp-40=00000000 fp-48= fp-56=00000000 fp-64=00000000 fp-72=mmmmmmmm fp-80=fp fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=mmmmmmmm fp-112=mmmmmmmm fp-120=mmmmmmmm fp-272=ctx fp-280=pkt fp-288=00000000 fp-296= fp-304=1 fp-312=00000000\n\t; bpf_printk(\"couldn't reserve space in the ringbuf. Dropping flow\");\n\t1629: (b7) r1 = 9
                     ; frame1: R1_w=9\n\t1630: (63) *(u32 *)(r10 -264) = r1    ; frame1: R1_w=9 R10=fp0 fp-264=9\n\t1631: (b7) r8 = 1                     ; frame1: R8_w=1\n\t; u32 initVal = 1;\n\t1632: (63) *(u32 *)(r10 -28) = r8     ; frame1: R8_w=1 R10=fp0 fp-32=mmmmmm??\n\t1633: (bf) r2 = r10
                   ; frame1: R2_w=fp0 R10=fp0\n\t;\n\t1634: (07) r2 += -264                 ; frame1: R2_w=fp-264\n\t; error_counter_p = bpf_map_lookup_elem(&global_counters, &key);\n\t1635: (18) r1 = 0xffff8b2008172c00    ; frame1: R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)\n
\t1637: (85) call bpf_map_lookup_elem#1\n\tfunction calls are not allowed while holding a lock\n\tprocessed 482 insns (limit 1000000) max_states_per_insn 2 total_states 29 peak_states 29 mark_read 15" component=ebpf.FlowFetcher
time="2025-01-21T12:34:34Z" level=fatal msg="can't instantiate NetObserv eBPF Agent" error="loading and assigning BPF objects: field TcEgressFlowParse: program tc_egress_flow_parse: load program: invalid argument: function calls are not allowed while holding a lock (280 line(s) omitted)"
=> function calls are not allowed while holding a lock
So we need to refactor and/or use always_inline ... That's actually tricky because add_observed_intf includes increase_counter which contains helper calls (bpf_map_lookup_elem) that cannot be inlined, so in any case that would require some refactoring.
I mentioned a second error that I saw last time, but I don't reproduce it today, so not sure what it was exactly....
So ... let me try to clean up all that and see how that works...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u can make add_intf return an error and move the counter outside at the caller to avoid nested functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I don't understand why I had an error on reading the value previously, while holding the lock. It doesn't happen this time. => 61cf8a6
61cf8a6    to
    462b66e      
    Compare
  
    | if (value->observed_direction[i] != direction && | ||
| value->observed_direction[i] != OBSERVED_DIRECTION_BOTH) { | ||
| // Same interface seen on a different direction => mark as both directions | ||
| value->observed_direction[i] = OBSERVED_DIRECTION_BOTH; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this case issues in flp generating the other flp direction field ?
https://github.com/netobserv/flowlogs-pipeline/blob/main/pkg/pipeline/transform/transform_network_direction.go#L32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FLP doesn't use the interface direction to figure out the node direction, it does that only by comparing agent IP versus src/dest node IP
I don't think there's anything that reads the IfDirections field, except in the console for display...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the impact of using direction both when we filter flows on direction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no impact precisely because I keep it separated from the direction enum that is used in filters.
The filters don't check anything in the observed_direction array of stored flows. So it's not impacted by this, and will continue to work simply with egress/ingress directions for each packet filtered individually.
| #define MAX_NETWORK_EVENTS 4 | ||
| #define MAX_OBSERVED_INTERFACES 4 | ||
| #define MAX_OBSERVED_INTERFACES 6 | ||
| #define OBSERVED_DIRECTION_BOTH 3 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have enum used to define possible directions https://github.com/netobserv/netobserv-ebpf-agent/blob/main/bpf/types.h#L75
so u can add both there and move MAX to 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to change the enum to not mess up anything with the filters, which uses that enum. This value is very specific to the "observed direction" thing
| /lgtm | 
| New images: These will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=c741005 make set-agent-image | 
| @jotak It's hard to tell deterministically whether we're seeing improvements because these errors are still expected:   in above I am using query:  | 
| @memodi weird .. when I tested it, they almost disappeared. There might have been something regressed since my first commits, let me re-investigate | 
Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data. In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number. lock for observed interface update Increase max interfaces.. ... and ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached) Also some padding optimisation
| New images: These will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=c97c4e6 make set-agent-image | 
| New images: These will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=ab76445 make set-agent-image | 
        
          
                bpf/flows.c
              
                Outdated
          
        
      | "proto=%d, sport=%d, dport=%d\n", | ||
| if_index, aggregate_flow->eth_protocol, pkt->id->transport_protocol, | ||
| pkt->id->src_port, pkt->id->dst_port); | ||
| if (pkt->id->transport_protocol != 0 && | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
icmpv4/v6 won't work with this check ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, anything with both ports = 0 won't report that as an error, it will limit the interfaces silently - I'm trying to place the cursor between making this issue too visible and too hidden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't protocol != 0 should be good enough ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO there's good chances that we'll see the error again in high density clusters. Now that I've added a severity label, maybe that's ok, @memodi can filter out low severity errors if that makes regression tests too flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
quick tests on my cluster show that it's sufficient indeed, we'll see if that's good enough for @memodi too
| @jotak: This pull request references NETOBSERV-2075 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| New images: These will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=26f4cae make set-agent-image | 
| /lgtm | 
| thanks @jotak , with latest image seeing huge improvements, I'll also make change in perf testing to only look for {severity!=low} agent errors   | 
| /label qe-approved | 
| @jotak: This pull request references NETOBSERV-2075 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| /approve | 
| [APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jotak The full list of commands accepted by this bot can be found here. The pull request process is described here 
Needs approval from an approver in each of these files:
 
 Approvers can indicate their approval by writing  | 
33abbd2
      into
      
  
    netobserv:main
  
    …ap (netobserv#509) * Improve performances by moving observed interfaces to main map Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data. In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number. lock for observed interface update Increase max interfaces.. ... and ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached) Also some padding optimisation * Discard non-IP flows * Include all updates in single lock block * Manage case with Both directions format * Do not report errors on proto/port 0 * Add severity on errors * Still report errors on 0-ports (e.g. icmp)
…ap (netobserv#509) * Improve performances by moving observed interfaces to main map Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data. In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number. lock for observed interface update Increase max interfaces.. ... and ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached) Also some padding optimisation * Discard non-IP flows * Include all updates in single lock block * Manage case with Both directions format * Do not report errors on proto/port 0 * Add severity on errors * Still report errors on 0-ports (e.g. icmp)
Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data.
In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number.
Also:
Description
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.